Efficient Vocal Tract Normalization in Automatic Speech Recognition
Authors
Abstract
In this paper we study the effect of vocal tract normalization (VTN) on the word error rate (WER) in speaker-independent large-vocabulary speech recognition. Evaluation test results are reported for the German VerbMobil II (VM II) and the English Wall Street Journal (WSJ) corpora. In particular, we analyse: the effect of the type of warping function (linear vs. non-linear) on the WER; different methods for estimating the warping factor in recognition; incremental warping factor estimation for single-pass online recognition; and the phoneme dependence of the warping factors. We find that a simple piecewise linear warping function performs better than non-linear frequency warping. In recognition, a two-pass approach performs as well as supervised VTN on the reference transcription, even if the WER of the first recognition pass is of the order of 20 to 30%. Fast warping factor estimation with text-independent models results in only a slight performance degradation but allows the system to run at the same speed as a single-pass recognizer without VTN. A minor improvement over baseline VTN is obtained with phoneme-dependent warping factors.
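As an illustration of the piecewise linear warping mentioned in the abstract, the following is a minimal sketch of one commonly used form of such a function. The abstract does not give the paper's exact definition, so the function name, the breakpoint rule, and the parameters f_max and break_ratio are assumptions made for this example only.

```python
def piecewise_linear_warp(freq, alpha, f_max=8000.0, break_ratio=0.875):
    """Warp a frequency in Hz with warping factor alpha.

    Assumed form (not taken from the paper): frequencies below a
    breakpoint are scaled by alpha; above it, a second linear segment
    is used so that f_max is always mapped onto f_max.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq <= f_max:
        raise ValueError("freq must lie in [0, f_max]")

    # Breakpoint: shrunk for alpha > 1 so that alpha * f0 stays below f_max.
    f0 = break_ratio * f_max * min(1.0, 1.0 / alpha)

    if freq <= f0:
        # First segment: plain linear scaling.
        return alpha * freq

    # Second segment: straight line from (f0, alpha * f0) to (f_max, f_max).
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (freq - f0)


if __name__ == "__main__":
    # Warp a few frequencies for a hypothetical warping factor of 1.1.
    for f in (500.0, 4000.0, 7500.0, 8000.0):
        print(f, piecewise_linear_warp(f, alpha=1.1))
```

With alpha = 1.0 the function reduces to the identity, and for any positive alpha the band edge f_max is left unchanged, so the warped spectrum keeps the same bandwidth as the original.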
Similar resources
Fast Estimation of Warping Factors in Vocal Tract Length Normalization Using Scores Obtained from Gender Detection Modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
On the Use of a Wave-Reflection Model for the Estimation of Spectral Effects due to Vocal Tract Length Changes with Application to Automatic Speech Recognition
Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss ef...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract lengths can cause inter-speaker acoustic variability in speech signals spoken by different speakers for the same text, and these variations affect the robustness of a speaker-independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Low-dimensional, auditory feature vectors that improve vocal-tract-length normalization in automatic speech recognition
Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition
Vocal tract length normalization (VTLN) is commonly applied utterance-wise with a warping function that makes the assumption of a linear dependence between the vocal tract length and the location of the formants. In this work we propose a data-driven method for enhancing the performance of systems that already use standard VTLN. The method is based on elastic registration to estimate optimal non...